Partial Replica Selection Based on Relevance for Information

Authors

  • Zhihong Lu
  • Kathryn S. McKinley
Abstract

Partial collection replication improves performance and scalability of a large-scale distributed information retrieval system by distributing excessive workloads, reducing network latency, and restricting some searches to a small percentage of data. In this paper, we first examine queries from real system logs and show that there is sufficient query locality in real systems to justify partial collection replication. We then present a method for constructing a hierarchy of partial replicas from a collection where each replica is a subset of all larger replicas, and extend the inference network model to rank and select partial replicas. We compare our new selection algorithm to previous work on collection selection over a range of tuning parameters. For a given query, our replica selection algorithm correctly determines the most relevant of the replicas or original collection, and thus maintains the highest retrieval effectiveness while searching the least data, compared with the other ranking functions. Simulation results show that, with load balancing, partial replication consistently improves performance over collection partitioning on multiple disks of a shared-memory multiprocessor, and that it requires only modest query locality.
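The nesting property described in the abstract (each replica is a subset of every larger replica) can be illustrated with a small sketch. This is not the authors' construction or their inference-network ranking, neither of which the abstract specifies. It assumes, purely for illustration, that replicas hold the documents most often retrieved in a query log, and that selection is an idealized oracle that already knows which documents a query needs. The names `build_replica_hierarchy` and `select_replica` are hypothetical.

```python
from collections import Counter


def build_replica_hierarchy(query_log_results, sizes):
    """Return nested replicas as sets of document ids, smallest first.

    Assumption for illustration: a replica of size k holds the k documents
    that appear most often in the logged result lists.
    """
    counts = Counter(doc for results in query_log_results for doc in results)
    # Sort by descending frequency; break ties by document id for determinism.
    ranked = sorted(counts, key=lambda d: (-counts[d], d))
    # Taking prefixes of one ranking guarantees each replica is a subset
    # of every larger replica.
    return [set(ranked[:k]) for k in sorted(sizes)]


def select_replica(hierarchy, needed_docs):
    """Idealized oracle: index of the smallest replica that contains every
    needed document, or None to fall back to the original collection."""
    needed = set(needed_docs)
    for index, replica in enumerate(hierarchy):
        if needed <= replica:
            return index
    return None


# Hypothetical query log: each inner list is the result set of one past query.
log = [["d1", "d2"], ["d1", "d3"], ["d1", "d2"]]
hierarchy = build_replica_hierarchy(log, sizes=[1, 2])
```

A real selector cannot know the needed documents in advance, which is why the paper ranks replicas with an extended inference network model instead of an oracle like this one.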


Similar resources

An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity

Data grid technology, which uses the scale of the Internet to overcome storage limits for huge amounts of data, has become a hot research topic. Recently, data replication strategies have been widely employed in distributed environments to copy frequently accessed data to suitable sites. The primary purposes are shortening distance of file transmission and achieving files from ...


Searching a Terabyte of Text Using Partial Replication

The explosion of content in distributed information retrieval (IR) systems requires new mechanisms in order to attain timely and accurate retrieval of unstructured text. In this paper, we investigate using partial replication to search a terabyte of text in our distributed IR system. We use a replica selection database to direct queries to relevant replicas that maintain query effectiveness, bu...


A PGSA Based Data Replica Selection Scheme for Accessing Cloud Storage System

The data replica management scheme is a critical component of a cloud storage system. To enhance scalability and reliability while also improving system response time, a multiple data replica scheme is adopted. When a cloud user issues an access request, a suitable replica should be selected to respond to it in order to shorten user access time and promote system load balance. ...


Replica Selection in the Globus Data Grid

The Globus Data Grid architecture provides a scalable infrastructure for the management of storage resources and data that are distributed across Grid environments. These services are designed to support a variety of scientific applications, ranging from high-energy physics to computational genomics, that require access to large amounts of data (terabytes or even petabytes) with varied quality ...




Publication date: 1999